Skip to content

Comments

Improved CRD discovery to automatically scan repositories#886

Open
CWAbhi wants to merge 8 commits intomeshery:masterfrom
CWAbhi:new_branch
Open

Improved CRD discovery to automatically scan repositories#886
CWAbhi wants to merge 8 commits intomeshery:masterfrom
CWAbhi:new_branch

Conversation

@CWAbhi
Copy link

@CWAbhi CWAbhi commented Jan 3, 2026

Description
This PR fixes #881

Notes for Reviewers
This PR enhances the CRD discovery mechanism by allowing users to specify a repository URL as the source instead of a static directory path. The generator will automatically discover CRDs within the repository, regardless of their location in the file structure.

Implementation Details
Introduces a more robust CRD discovery strategy inspired by the logic used in crdsdev/doc.
Replaces static directory assumptions with recursive repository scanning and pattern-based CRD detection.
Improves resilience against repository structure changes without impacting existing functionality.

Benefits
Reduces configuration complexity for users.
Makes CRD discovery resilient to repository refactors.
Improves usability and long-term maintainability of the model generator.

Example Usage
Source: https://github.com/crossplane/crossplane
The generator automatically locates CRDs without requiring users to specify internal paths.

Signed commits

  • [ ✅] Yes, I signed my commits.

@welcome
Copy link

welcome bot commented Jan 3, 2026

Yay, your first pull request! 👍 A contributor will be by to give feedback soon. In the meantime, you can find updates in the #github-notifications channel in the community Slack.
Be sure to double-check that you have signed your commits. Here are instructions for making signing an implicit activity while performing a commit.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @CWAbhi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the CRD discovery mechanism by introducing automatic repository scanning for Custom Resource Definitions. Instead of requiring precise file paths, users can now provide a GitHub repository URL, and the system will intelligently locate CRDs, improving usability and making the process more resilient to repository structure changes. This change streamlines configuration and ensures long-term maintainability of the model generator.

Highlights

  • Automatic CRD Discovery: The system can now automatically discover Custom Resource Definitions (CRDs) within a specified GitHub repository URL, eliminating the need for users to provide exact file paths.
  • Flexible GitHub URL Parsing: The GitRepo now supports a wider range of GitHub URL formats, including those with '/tree/branch' and '/blob/branch' paths, and defaults to the 'main' branch if not specified.
  • CRD-Aware File Interception: A new 'crdAwareFileInterceptor' intelligently scans only YAML/JSON files for actual CRD content, skipping irrelevant files and handling multi-document YAML.
  • Enhanced GitRepo Integration: Standard GitHub URLs are now explicitly routed to the GitRepo downloader, ensuring the new discovery mechanism is utilized for all GitHub-hosted repositories.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
@CWAbhi
Copy link
Author

CWAbhi commented Jan 3, 2026

@leecalcote can you please check the PR and review it

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the GitHub generator to automatically discover Custom Resource Definitions (CRDs) within repositories by refactoring URL parsing to support various GitHub URL formats and introducing a crdAwareFileInterceptor for recursive traversal and identification of CRD-containing YAML/JSON files. Review comments indicate that the CRD detection logic, particularly the string-based checks for kind: CustomResourceDefinition and the multi-document YAML splitting, is fragile and complex, suggesting improvements like using regular expressions for robustness. Additionally, a minor refactoring is recommended to remove an unused scheme parameter from the isGitHubURL helper function.

Comment on lines 221 to 230
if strings.Contains(doc, "kind: CustomResourceDefinition") {
hasCRD = true
break
}
// Check for JSON format
if strings.Contains(doc, "\"kind\":\"CustomResourceDefinition\"") ||
strings.Contains(doc, `"kind":"CustomResourceDefinition"`) {
hasCRD = true
break
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current string-based checks for kind: CustomResourceDefinition are fragile and may fail with minor formatting variations, such as extra whitespace. Additionally, the check for JSON on lines 226-227 contains a redundant condition, as both sides of the || operator are identical.

To improve robustness, consider using regular expressions to handle whitespace variations. For example:

// Note: This requires importing the `regexp` package.

// For YAML
if match, _ := regexp.MatchString(`kind:\s*CustomResourceDefinition`, doc); match {
    hasCRD = true
    break
}

// For JSON
if match, _ := regexp.MatchString(`"kind"\s*:\s*"CustomResourceDefinition"`, doc); match {
    hasCRD = true
    break
}

Alternatively, a full unmarshal would be even more robust, though potentially slower.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider these suggestions @CWAbhi

Comment on lines +187 to +211
documents := strings.Split(content, "\n---\n")
// Also handle documents separated by "---" at the start of a line
if len(documents) == 1 {
// Try splitting by lines starting with "---"
lines := strings.Split(content, "\n")
var docs []string
var currentDoc strings.Builder
for _, line := range lines {
if strings.TrimSpace(line) == "---" && currentDoc.Len() > 0 {
docs = append(docs, currentDoc.String())
currentDoc.Reset()
} else {
if currentDoc.Len() > 0 {
currentDoc.WriteString("\n")
}
currentDoc.WriteString(line)
}
}
if currentDoc.Len() > 0 {
docs = append(docs, currentDoc.String())
}
if len(docs) > 1 {
documents = docs
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for splitting multi-document YAML files is complex and has a few issues:

  1. The initial split strings.Split(content, "\n---\n") on line 187 is not robust for all multi-document YAML files.
  2. In the fallback logic, the condition && currentDoc.Len() > 0 on line 195 can cause the --- separator at the beginning of a file to be incorrectly included in the first document.
  3. The else block on line 198 will append the --- line to the next document if the if condition on line 195 is false.
  4. On line 208, if len(docs) > 1 prevents processing files with a single document that might be discovered by this splitting logic.

Consider simplifying this entire block with a more robust method, such as using a regular expression to split documents, which would be cleaner and less error-prone.


// isGitHubURL checks if the URL is a GitHub repository URL
// This enables automatic CRD discovery for standard GitHub URLs
func isGitHubURL(scheme string, url *url.URL) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The scheme parameter is unused within this function. Removing it would improve code clarity. Remember to update the call to this function on line 16 as well.

Suggested change
func isGitHubURL(scheme string, url *url.URL) bool {
func isGitHubURL(url *url.URL) bool {

@leecalcote
Copy link
Member

leecalcote commented Jan 7, 2026

This is a good item to add to the weekly Meshery Dev meeting agenda. You can add this item in the doc, attend, and present it. Meeting details at https://meshery.io/calendar.

@CWAbhi
Copy link
Author

CWAbhi commented Jan 7, 2026

@leecalcote Sounds good . I’ll add this to the agenda and join the meeting to present it. Thanks for the direction!

Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
@CWAbhi
Copy link
Author

CWAbhi commented Jan 9, 2026

@leecalcote i have fixed the version issue kindly review it.

if doc == "" {
continue
}
// Check for YAML format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use libraries, not string parsing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay noted !

repo = parts[1]

// Remove .git suffix from repo name if present
repo = strings.TrimSuffix(repo, ".git")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's find a library that offers this functionality.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i will look for a library that offers this and update it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i will check this out @parthivsaikia .

Comment on lines 221 to 230
if strings.Contains(doc, "kind: CustomResourceDefinition") {
hasCRD = true
break
}
// Check for JSON format
if strings.Contains(doc, "\"kind\":\"CustomResourceDefinition\"") ||
strings.Contains(doc, `"kind":"CustomResourceDefinition"`) {
hasCRD = true
break
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider these suggestions @CWAbhi


// isGitHubURL checks if the URL is a GitHub repository URL
// This enables automatic CRD discovery for standard GitHub URLs
func isGitHubURL(scheme string, url *url.URL) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a library instead of manual parsing would be more robust.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i will change from manual parsing of this to a library

Comment on lines -6 to -13
linters-settings:
staticcheck:
go: "1.25"
checks: ["all", "-ST1005"]

issues:
exclude:
- "ST1005" No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider reverting these changes since they are not focused on the specific issue and also they have been taken care in another PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i ll revert the latest commit

Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
@CWAbhi CWAbhi requested a review from leecalcote January 16, 2026 05:09
@CWAbhi CWAbhi requested a review from parthivsaikia January 17, 2026 10:26
@parthivsaikia
Copy link
Member

@CWAbhi have you addressed the changes requested by gemini? And also I cannot see any usage of libraries as requested.

@CWAbhi
Copy link
Author

CWAbhi commented Jan 18, 2026

@parthivsaikia i have not addressed the changes requested by gemini . but i have used the libraries as requested i have used different library but ill check it once again today and address the changes requested by gemini too and commit the changes.

Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
@CWAbhi
Copy link
Author

CWAbhi commented Jan 19, 2026

@parthivsaikia can you check it once i have fixed the gemini and uses git parser library.

@CWAbhi
Copy link
Author

CWAbhi commented Feb 6, 2026

@leecalcote @parthivsaikia any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Auto-discover CRDs from repository root URL

3 participants